Submitted. 


TECHNICAL  REPORT 
R-366 

October  2011 


Measurement Bias and Effect Restoration in Causal Inference


Manabu Kuroki

The  Institute  of  Statistical  Mathematics,  Tachikawa,  Tokyo,  Japan 
Judea  Pearl 

University  of  California,  Los  Angeles,  Los  Angeles,  CA,  USA 


Summary. 

This  paper  highlights  several  areas  where  graphical  techniques  can  be  harnessed  to  address 
the  problem  of  measurement  errors  in  causal  inference.  In  particular,  the  paper  discusses  the 
control of partially observable confounders in parametric and nonparametric models and the
computational  problem  of  obtaining  bias-free  effect  estimates  in  such  models. 

1. Introduction

This  paper  discusses  methods  of  dealing  with  measurement  errors  in  the  context  of  graph-based 
causal  inference.  It  is  motivated  by  a  powerful  result  reported  in  Greenland  and  Lash  (2008)  which 
is  rooted  in  classical  regression  analysis  (Greenland  and  Kleinbaum,  1983;  Selen,  1986;  Carroll  et 
al.,  2006),  but  has  not  been  fully  utilized  in  causal  analysis  or  graphical  models. 

Let $\mathrm{pr}(v)$ be the joint distribution of $V = (V_1, \ldots, V_n) = (v_1, \ldots, v_n)$, $\mathrm{pr}(v_i \mid v_j)$ the conditional distribution of $V_i = v_i$ given $V_j = v_j$, and $\mathrm{pr}(v_i)$ the marginal distribution of $V_i = v_i$. Similar notation is used for other distributions. For the graph-theoretic terminology used in this paper, we refer readers to Pearl (1988, 2009).

Given a directed acyclic graph $G = (V, E)$ with a set $V$ of variables and a set $E$ of arrows, a probability distribution $\mathrm{pr}(v)$ is said to be compatible with $G$ if it can be factorized as

$$\mathrm{pr}(v) = \prod_{i=1}^{n} \mathrm{pr}\{v_i \mid \mathrm{pa}(v_i)\}, \tag{1}$$

where $\mathrm{pa}(v_i)$ is the set of parents of $V_i$. When $\mathrm{pa}(v_i)$ is an empty set, $\mathrm{pr}\{v_i \mid \mathrm{pa}(v_i)\}$ is the marginal distribution $\mathrm{pr}(v_i)$. When equation (1) holds, we also say that $G$ is a Bayesian network of $\mathrm{pr}(v)$ (Pearl, 2009, pp. 13-16).

If a joint distribution is factorized recursively according to the graph $G$, then the conditional independencies implied by the factorization (1) can be obtained from the graph $G$ according to the d-separation criterion (Pearl, 1988). That is, for any distinct subsets $X$, $Y$ and $Z$, if $Z$ d-separates $X$ from $Y$ in $G$, then $X$ is conditionally independent of $Y$ given $Z$, denoted as $X \perp\!\!\!\perp Y \mid Z$, in every distribution satisfying equation (1).
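
The link between factorization (1) and d-separation can be checked numerically. The following is a minimal sketch, assuming a hypothetical binary model with arrows U -> X, U -> Y, X -> Y and U -> W; all probability tables are made-up numbers, not values from this paper. It builds the joint distribution from the factorization and confirms that W is independent of {X, Y} given U, as d-separation predicts.

```python
import numpy as np

# Hypothetical conditional probability tables (illustrative numbers only).
p_u = np.array([0.6, 0.4])                        # pr(u)
p_x_u = np.array([[0.8, 0.2], [0.3, 0.7]])        # pr(x | u), indexed [u, x]
p_w_u = np.array([[0.9, 0.1], [0.2, 0.8]])        # pr(w | u), indexed [u, w]
p_y_ux = np.array([[[0.7, 0.3], [0.4, 0.6]],      # pr(y | x, u), indexed [u, x, y]
                   [[0.5, 0.5], [0.1, 0.9]]])

# Equation (1): joint[u, x, y, w] = pr(u) pr(x|u) pr(y|x,u) pr(w|u).
joint = np.einsum("u,ux,uxy,uw->uxyw", p_u, p_x_u, p_y_ux, p_w_u)

# U blocks every path between W and {X, Y}, so pr(w | u, x, y) must equal pr(w | u).
p_uxy = joint.sum(axis=3)                         # pr(u, x, y)
p_w_given_uxy = joint / p_uxy[..., None]          # pr(w | u, x, y)
independent = np.allclose(p_w_given_uxy, p_w_u[:, None, None, :])
```

The check holds for any choice of tables, because the independence follows from the graph structure alone.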

If every parent-child family in the graph $G$ stands for an independent data-generating mechanism, the Bayesian network is called a causal diagram (see Pearl, 2009, p. 24, for a formal definition). Based on a causal diagram $G$, for any $X, Y \in V$, the causal effect of $X$ on $Y$ is defined as


where $do(x)$ indicates that $X$ is fixed to $x$ by an external intervention (Pearl, 2009). When the causal effect can be determined uniquely from a joint distribution of observed variables, the causal effect is said to be identifiable. The most common identifiability condition that can be obtained from the graph structure is the back door criterion. A set $S$ of variables is said to satisfy the back door criterion relative to an ordered pair of variables $(X, Y)$ if (i) no vertex in $S$ is a descendant of $X$, and (ii) $S$ d-separates $X$ from $Y$ in the graph obtained by deleting from $G$ all arrows emerging from $X$. If any such set can be measured, the causal effect of $X$ on $Y$ is identifiable and is given by the formula $\mathrm{pr}\{y \mid do(x)\} = \sum_{s} \mathrm{pr}(y \mid x, s)\,\mathrm{pr}(s)$ (Pearl, 2009, pp. 79-80); $S$ is then called sufficient.
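
As an illustration of the back door formula, the sketch below evaluates $\sum_s \mathrm{pr}(y \mid x, s)\,\mathrm{pr}(s)$ from a joint table. The function name `backdoor_effect` and all numbers are hypothetical; the set S is assumed to be flattened into a single discrete variable.

```python
import numpy as np

def backdoor_effect(joint_sxy):
    """Return pr(y | do(x)) as an array indexed [x, y].

    joint_sxy[s, x, y] is the joint pr(s, x, y) of a sufficient set S,
    the treatment X and the outcome Y.
    """
    p_s = joint_sxy.sum(axis=(1, 2))                   # pr(s)
    p_sx = joint_sxy.sum(axis=2)                       # pr(s, x)
    p_y_given_sx = joint_sxy / p_sx[:, :, None]        # pr(y | x, s)
    return np.einsum("s,sxy->xy", p_s, p_y_given_sx)   # sum_s pr(y|x,s) pr(s)

# Hypothetical binary example.
p_s = np.array([0.6, 0.4])
p_x_s = np.array([[0.8, 0.2], [0.3, 0.7]])             # pr(x | s), indexed [s, x]
p_y_sx = np.array([[[0.7, 0.3], [0.4, 0.6]],           # pr(y | x, s), indexed [s, x, y]
                   [[0.5, 0.5], [0.1, 0.9]]])
joint = np.einsum("s,sx,sxy->sxy", p_s, p_x_s, p_y_sx)
effect = backdoor_effect(joint)
```

Each row of `effect` is a probability distribution over y for one intervention value x.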


With the preparation above, we consider the problem of estimating the causal effect of $X$ on $Y$ when a sufficient confounder $U$ is unobserved, and can only be measured with error (see Fig. 1), via a proxy variable $W$. In Fig. 1, $U$ satisfies the back door criterion relative to an ordered pair of variables $(X, Y)$, but its proxy variable $W$ does not. Since $U$ is sufficient, the causal effect is identifiable from measurement on $X$, $Y$ and $U$, and can be written as

$$\mathrm{pr}\{y \mid do(x)\} = \sum_{u} \mathrm{pr}(y \mid x, u)\,\mathrm{pr}(u) = \sum_{u} \frac{\mathrm{pr}(x, y, u)\,\mathrm{pr}(u)}{\mathrm{pr}(x, u)}. \tag{2}$$

Fig. 1: The causal effect of X on Y is needed when U is unobserved and W provides a noisy measurement of U.

However, since $U$ is unobserved and $W$ is but a noisy measurement of $U$, d-separation tells us immediately that adjusting for $W$ is inadequate, for it leaves the back door path $X \leftarrow U \rightarrow Y$ unblocked. Therefore, regardless of sample size, the causal effect of $X$ on $Y$ cannot be estimated without bias. It turns out, however, that if we are given the conditional distribution $\mathrm{pr}(w \mid u)$ that governs the error mechanism, we can perform a modified adjustment for $W$ that, in the limit of very large samples, would amount to the same thing as observing and adjusting for $U$ itself, thus rendering the causal effect identifiable. The possibility of removing bias by modified adjustment is far from obvious, because, although $\mathrm{pr}(w \mid u)$ is assumed to be given, the actual value of the confounder $U$ remains uncertain for each measurement $W = w$, so one would expect to get either a distribution over causal effects, or bounds thereof. Instead, we can actually get a repaired point estimate of $\mathrm{pr}\{y \mid do(x)\}$ that is asymptotically unbiased.

This result, which we will label effect restoration, has powerful consequences in practice because, when $\mathrm{pr}(w \mid u)$ is not given, one can resort to a Bayesian (or bounding) analysis and assume a prior distribution (or bounds) on the parameters of $\mathrm{pr}(w \mid u)$, which would yield a distribution (or bounds) over $\mathrm{pr}\{y \mid do(x)\}$ (Greenland, 2005). Alternatively, if costs permit, one can estimate $\mathrm{pr}(w \mid u)$ by re-testing $U$ in a sampled subpopulation (Carroll et al., 2006). This is normally done by re-calibration techniques (Greenland and Lash, 2008), called a validation study, in which $U$ is measured without error in a subpopulation and used to calibrate the estimates in the main study (Selen, 1986).

On the surface, the possibility of correcting for measurement bias seems to undermine the importance of accurate measurements. It suggests that as long as we know how bad our measurements are, there is no need to improve them because they can be corrected post hoc by analytical means. This is not so. First, although an unbiased effect estimate can be recovered from noisy measurements, sampling variability increases substantially with error. Second, even assuming unbounded sample size, the estimate will be biased if the postulated $\mathrm{pr}(w \mid u)$ is incorrect. In extreme cases, a wrongly postulated $\mathrm{pr}(w \mid u)$ may even conflict with the data, and no estimate will be obtained. For example, if we postulate a non-informative $W$, $\mathrm{pr}(w \mid u) = \mathrm{pr}(w)$, and we find that $W$ strongly depends on $X$, a contradiction arises and no effect estimate will emerge.

Effect restoration can be analyzed from either a statistical or a causal viewpoint. Taking the statistical view, one may argue that, once the causal effect is identified in terms of a latent variable $U$ and given the estimand in equation (2), the problem is no longer one of causal inference, but rather of regression analysis, whereby the regressional expression $E_u\{\mathrm{pr}(y \mid x, u)\}$ needs to be estimated from a noisy measurement of $U$, given by $W$. This is indeed the approach taken in the vast literature on measurement error (e.g., Selen, 1986; Carroll et al., 2006).

The causal analytic perspective is different; it maintains that the ultimate purpose of the analysis is not the statistics of $X$, $Y$, and $U$, as is normally assumed in the measurement-error literature, but the causal effect $\mathrm{pr}\{y \mid do(x)\}$, which is mapped into regression vocabulary only when certain causal assumptions are deemed plausible. Awareness of these assumptions should shape the way we deal with measurement error. For example, the very idea of modeling the error mechanism $\mathrm{pr}(w \mid u)$ requires causal considerations; errors caused by noisy measurements are fundamentally different from those caused by noisy agitators. Likewise, the reason we seek an estimate of $\mathrm{pr}(w \mid u)$ as opposed to $\mathrm{pr}(u \mid w)$, be it from scientific judgments or from pilot studies, is that we consider the former to be a more reliable and transportable parameter than the latter. Transportability (Pearl and Bareinboim, 2011) is a causal notion that is hardly touched upon in the measurement-error literature, where causal vocabulary is usually avoided and causal relations are relegated to informal judgment (e.g., Carroll et al., 2006, pp. 29-32).

Viewed from this perspective, the measurement-error literature appears to be engaged (unwittingly) in causal considerations that can benefit from making the causal framework explicit. The benefit can in fact be mutual; identifiability with partially specified causal parameters (as in Fig. 1) is rarely discussed in the causal inference literature (notable exceptions are Goetghebeur and Vansteelandt (2005), Cai and Kuroki (2008), Hernan and Cole (2009) and Pearl (2010)), while graphical models are hardly used in the measurement-error literature.

In this paper, we will consider the mathematical aspects of effect restoration and will focus on asymptotic analysis. Our aims are to understand the conditions under which effect restoration is feasible, to assess the computational problems it presents, and to identify those features of $\mathrm{pr}(w \mid u)$ and $\mathrm{pr}(x, y, w)$ that are major contributors to measurement bias.

2.  Effect  Restoration  with  External  Studies 

2.1. Matrix Adjustment Method

Guided by the graph shown in Fig. 1, we start with the joint probability $\mathrm{pr}(x, y, w, u)$ and assume that $W$ depends only on $U$, i.e., $\mathrm{pr}(w \mid x, y, u) = \mathrm{pr}(w \mid u)$. This assumption is often called non-differential error (Carroll et al., 2006).

We  further  assume  that: 

(a) the distribution $\mathrm{pr}(w \mid u)$ of the error mechanism is available from external studies such as pilot studies or scientific judgments, and

(b) the confounder $U$ is a discrete variable with a given finite number of categories, while $X$, $Y$ and $W$ may be continuous or discrete, as long as the number of categories of $W$ is greater than or equal to that of $U$.

The main idea of recovering $\mathrm{pr}(x, y, u)$ from both $\mathrm{pr}(x, y, w)$ and $\mathrm{pr}(w \mid u)$, adapted from Greenland and Lash (2008, p. 360) and Pearl (2010), is as follows: for $U$ and $W$ such that $u \in \{u_1, \ldots, u_k\}$ and $w \in \{w_1, \ldots, w_k\}$, we have

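$$\mathrm{pr}(x, y, w) = \sum_{i=1}^{k} \mathrm{pr}(x, y, u_i)\,\mathrm{pr}(w \mid u_i). \tag{3}$$

Then, for any specific values $x$ and $y$, letting

$$V_{xy}(u) = \begin{pmatrix} \mathrm{pr}(x, y, u_1) \\ \vdots \\ \mathrm{pr}(x, y, u_k) \end{pmatrix}, \quad
V_{xy}(w) = \begin{pmatrix} \mathrm{pr}(x, y, w_1) \\ \vdots \\ \mathrm{pr}(x, y, w_k) \end{pmatrix}, \quad
M(w, u) = \begin{pmatrix} \mathrm{pr}(w_1 \mid u_1) & \cdots & \mathrm{pr}(w_1 \mid u_k) \\ \vdots & \ddots & \vdots \\ \mathrm{pr}(w_k \mid u_1) & \cdots & \mathrm{pr}(w_k \mid u_k) \end{pmatrix},$$

equations (3) can be written as matrix multiplication:

$$V_{xy}(w) = M(w, u)\,V_{xy}(u). \tag{4}$$

Now, assuming that

(c) $M(w, u)$ is invertible,

the elements $\mathrm{pr}(x, y, u)$ of $V_{xy}(u)$ are estimable and are given by

$$V_{xy}(u) = I(w, u)\,V_{xy}(w), \tag{5}$$

where $I(w, u) = M(w, u)^{-1}$. Thus, equation (5) enables us to reconstruct $\mathrm{pr}(x, y, u)$ from $\mathrm{pr}(x, y, w)$.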
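
The inversion in equations (4) and (5) can be sketched in a few lines. This is an illustration under assumed inputs rather than an estimation procedure: the function name `restore_joint`, the error matrix and the latent distribution are all hypothetical, and the observed table is generated exactly from equation (3) instead of from data.

```python
import numpy as np

def restore_joint(p_xyw, m_wu):
    """Recover pr(x, y, u) from pr(x, y, w) and M(w, u).

    p_xyw[x, y, j] = pr(x, y, w_j); m_wu[j, i] = pr(w_j | u_i), so each
    column of m_wu sums to one.
    """
    if abs(np.linalg.det(m_wu)) < 1e-12:
        raise ValueError("M(w, u) is singular: assumption (c) fails")
    i_wu = np.linalg.inv(m_wu)                     # I(w, u) = M(w, u)^{-1}
    return np.einsum("ij,xyj->xyi", i_wu, p_xyw)   # V_xy(u) = I(w, u) V_xy(w)

# Hypothetical three-category error mechanism and latent joint distribution.
m_wu = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.7, 0.2],
                 [0.1, 0.2, 0.7]])
rng = np.random.default_rng(0)
p_xyu = rng.random((2, 2, 3))
p_xyu /= p_xyu.sum()                               # pr(x, y, u)
p_xyw = np.einsum("ji,xyi->xyj", m_wu, p_xyu)      # equation (3)
restored = restore_joint(p_xyw, m_wu)
```

With finite samples the entries of `p_xyw` would be estimated frequencies, and the restored cells could then fall slightly outside [0, 1].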

In other words, each term on the right-hand side of equation (2) can be obtained from $\mathrm{pr}(x, y, w)$ and $\mathrm{pr}(w \mid u)$ through equation (5) and, assuming $U$ is sufficient, the causal effect $\mathrm{pr}\{y \mid do(x)\}$ is estimable from the available data. Explicitly, letting $i(w, u)$ be the element of $I(w, u)$ that corresponds to $(W, U) = (w, u)$, we have:

$$\mathrm{pr}\{y \mid do(x)\} = \sum_{i=1}^{k} \frac{\mathrm{pr}(x, y, u_i)\,\mathrm{pr}(u_i)}{\mathrm{pr}(x, u_i)}
= \sum_{i=1}^{k} \frac{\sum_{j=1}^{k} i(w_j, u_i)\,\mathrm{pr}(x, y, w_j) \sum_{j=1}^{k} i(w_j, u_i)\,\mathrm{pr}(w_j)}{\sum_{j=1}^{k} i(w_j, u_i)\,\mathrm{pr}(x, w_j)}. \tag{6}$$

Note that the same inverse matrix, $I(w, u)$, appears in all summations.

When we do not assume independent noise mechanisms, this will not be the case. In other words, if $\mathrm{pr}(w \mid x, y, u) = \mathrm{pr}(w \mid u)$ does not hold, we must write:

$$\mathrm{pr}(x, y, w) = \sum_{i=1}^{k} \mathrm{pr}(w \mid x, y, u_i)\,\mathrm{pr}(x, y, u_i),$$

which can be transformed to matrix multiplication $V_{xy}(w) = M_{xy}(w, u)\,V_{xy}(u)$, where

$$M_{xy}(w, u) = \begin{pmatrix} \mathrm{pr}(w_1 \mid y, x, u_1) & \cdots & \mathrm{pr}(w_1 \mid y, x, u_k) \\ \vdots & \ddots & \vdots \\ \mathrm{pr}(w_k \mid y, x, u_1) & \cdots & \mathrm{pr}(w_k \mid y, x, u_k) \end{pmatrix}$$

and its inverse $I_{xy}(w, u)$ are both indexed by the specific values of $x$ and $y$. Thus, when both $X$ and $Y$ are discrete variables with a given finite number of categories, we obtain:

$$V_{xy}(u) = I_{xy}(w, u)\,V_{xy}(w), \tag{7}$$

which, again, permits the identification of the causal effect via equation (2), except that the expression becomes somewhat more complicated. It is also clear that errors in the measurement of $X$ and $Y$ can be absorbed into $W$, and do not present any conceptual problem.

Equation (6) demonstrates the feasibility of effect reconstruction and proves that, despite the uncertainty in the variables $X$, $Y$ and $U$, the causal effect is identifiable once we know the statistics of the error mechanism.
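
The computation described by equation (6) can be sketched as follows, under the non-differential error assumption. The model numbers and the name `restored_effect` are hypothetical; the "truth" is computed from the latent model only to confirm that the restored value agrees with it.

```python
import numpy as np

def restored_effect(p_xyw, m_wu):
    """Equation (6): pr(y | do(x)), indexed [x, y], from pr(x, y, w) and M(w, u)."""
    i_wu = np.linalg.inv(m_wu)                       # the same inverse in every sum
    p_xyu = np.einsum("ij,xyj->xyi", i_wu, p_xyw)    # sum_j i(w_j,u_i) pr(x,y,w_j)
    p_xu = p_xyu.sum(axis=1)                         # sum_j i(w_j,u_i) pr(x,w_j)
    p_u = p_xu.sum(axis=0)                           # sum_j i(w_j,u_i) pr(w_j)
    return np.einsum("xyi,i,xi->xy", p_xyu, p_u, 1.0 / p_xu)

# Hypothetical binary model: U -> X, U -> Y, X -> Y, U -> W.
p_u = np.array([0.6, 0.4])
p_x_u = np.array([[0.8, 0.2], [0.3, 0.7]])           # pr(x | u), indexed [u, x]
p_y_ux = np.array([[[0.7, 0.3], [0.4, 0.6]],         # pr(y | x, u), indexed [u, x, y]
                   [[0.5, 0.5], [0.1, 0.9]]])
m_wu = np.array([[0.9, 0.2], [0.1, 0.8]])            # m_wu[j, i] = pr(w_j | u_i)

p_xyu = np.einsum("u,ux,uxy->xyu", p_u, p_x_u, p_y_ux)
p_xyw = np.einsum("ju,xyu->xyj", m_wu, p_xyu)        # what is actually observed
truth = np.einsum("u,uxy->xy", p_u, p_y_ux)          # sum_u pr(y|x,u) pr(u)
effect = restored_effect(p_xyw, m_wu)
```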

This result is asymptotic, and presents practical challenges of computation and estimation. In particular, one needs to address the problem of empty cells which, owing to the high dimensionality of $W$ and $U$, would prevent us from getting reliable statistics of $\mathrm{pr}(x, y, w)$, as required by equation (6). When $X$ is a binary variable, one can then resort to propensity score (PS) methods (Rosenbaum and Rubin, 1983), which map the cells of $U$ onto a single scalar.

The error-free propensity score $L(u) = \mathrm{pr}(x_1 \mid u)$, being a functional of $\mathrm{pr}(x, y, u)$, can be estimated consistently from samples of $\mathrm{pr}(x, y, w)$ using the transformation (5). Explicitly, we have:

$$L(u) = \mathrm{pr}(x_1 \mid u) = \frac{\mathrm{pr}(x_1, u)}{\mathrm{pr}(u)},$$

where $\mathrm{pr}(x_1, u)$ and $\mathrm{pr}(u)$ are given in equation (5).

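Using the decomposition in $\mathrm{pr}(w \mid x, y, u) = \mathrm{pr}(w \mid u)$, we can further write:

$$L(u) = \frac{\sum_{j=1}^{k} i(w_j, u)\,\mathrm{pr}(x_1, w_j)}{\sum_{j=1}^{k} i(w_j, u)\,\mathrm{pr}(w_j)}
= \frac{\sum_{j=1}^{k} i(w_j, u)\,L(w_j)\,\mathrm{pr}(w_j)}{\sum_{j=1}^{k} i(w_j, u)\,\mathrm{pr}(w_j)}, \tag{8}$$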

where $L(w)$ is the error-prone propensity score $L(w) = \mathrm{pr}(x_1 \mid w)$. We see that $L(u)$ can be computed from $I(w, u)$, $L(w)$ and $\mathrm{pr}(w)$. Thus, if we succeed in estimating these three quantities in a parsimonious parametric form, the computation of $L(u)$ would be hindered only by the summations called for in equation (5). Once we estimate $L(w)$ parametrically for each conceivable $w$, equation (8) permits us to assign to each tuple $u$ a bias-free score $L(u)$ that correctly represents the probability of $X = x_1$ given $U = u$. This, in turn, should permit us to estimate, for each stratum $L(u) = l$, the probability

$$\mathrm{pr}(l) = \sum_{u \,:\, L(u) = l} \mathrm{pr}(u),$$

then compute the causal effect using

$$\mathrm{pr}\{y \mid do(x)\} = \sum_{l} \mathrm{pr}(y \mid x, l)\,\mathrm{pr}(l).$$
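
Equation (8) and the stratum probabilities pr(l) can be sketched as below. The function names, the rounding tolerance used to group equal scores, and all numbers are assumptions made for illustration; in particular, two of the three latent cells are given the same true score so that they merge into one stratum.

```python
import numpy as np

def error_free_propensity(l_w, p_w, m_wu):
    """Equation (8): L(u_i) from L(w_j) = pr(x_1 | w_j), pr(w_j) and M(w, u)."""
    i_wu = np.linalg.inv(m_wu)
    numerator = i_wu @ (l_w * p_w)       # sum_j i(w_j, u_i) L(w_j) pr(w_j)
    denominator = i_wu @ p_w             # sum_j i(w_j, u_i) pr(w_j) = pr(u_i)
    return numerator / denominator

def stratum_probabilities(l_u, p_u, decimals=8):
    """pr(l): total pr(u) over the cells u that share the score L(u) = l."""
    levels, inverse = np.unique(np.round(l_u, decimals), return_inverse=True)
    return levels, np.bincount(inverse, weights=p_u)

# Hypothetical error mechanism and latent quantities.
m_wu = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])       # m_wu[j, i] = pr(w_j | u_i)
p_u_true = np.array([0.5, 0.3, 0.2])
l_u_true = np.array([0.2, 0.6, 0.6])     # true pr(x_1 | u_i)

# Observable quantities implied by non-differential error.
p_w = m_wu @ p_u_true
l_w = (m_wu @ (l_u_true * p_u_true)) / p_w

l_u = error_free_propensity(l_w, p_w, m_wu)
p_u = np.linalg.inv(m_wu) @ p_w
levels, p_l = stratum_probabilities(l_u, p_u)
```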

One technique for approximating $\mathrm{pr}(l)$ was proposed by Stürmer et al. (2005) and Schneeweiss et al. (2009), which did not make full use of the inversion in equation (8) or of graphical methods facilitating this inversion. A more promising approach would be to construct $\mathrm{pr}(l)$ and $\mathrm{pr}(y \mid x, l)$ directly from synthetic samples of $\mathrm{pr}(x, y, u)$ that can be created to mirror the empirical samples of $\mathrm{pr}(x, y, w)$. This is illustrated in the next subsection, using binary variables.

2.2.  Matrix  Adjustment  in  Binary  Models 

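Let $X$, $Y$, $W$ and $U$ be binary variables, $w \in \{w_1, w_2\}$, $u \in \{u_1, u_2\}$, and let the noise mechanism be characterized by $\mathrm{pr}(w_2 \mid u_1)$ and $\mathrm{pr}(w_1 \mid u_2)$. Then, equation (5) translates to

$$\begin{aligned}
\mathrm{pr}(x, y, u_1) &= \frac{\{1 - \mathrm{pr}(w_1 \mid u_2)\}\,\mathrm{pr}(x, y, w_1) - \mathrm{pr}(w_1 \mid u_2)\,\mathrm{pr}(x, y, w_2)}{1 - \mathrm{pr}(w_2 \mid u_1) - \mathrm{pr}(w_1 \mid u_2)}, \\
\mathrm{pr}(x, y, u_2) &= \frac{\{1 - \mathrm{pr}(w_2 \mid u_1)\}\,\mathrm{pr}(x, y, w_2) - \mathrm{pr}(w_2 \mid u_1)\,\mathrm{pr}(x, y, w_1)}{1 - \mathrm{pr}(w_2 \mid u_1) - \mathrm{pr}(w_1 \mid u_2)},
\end{aligned} \tag{9}$$

which represents the inverse matrix

$$I(w, u) = \frac{1}{1 - \mathrm{pr}(w_2 \mid u_1) - \mathrm{pr}(w_1 \mid u_2)} \begin{pmatrix} 1 - \mathrm{pr}(w_1 \mid u_2) & -\mathrm{pr}(w_1 \mid u_2) \\ -\mathrm{pr}(w_2 \mid u_1) & 1 - \mathrm{pr}(w_2 \mid u_1) \end{pmatrix}.$$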

Metaphorically, the transformation in equation (9) can be described as a mass re-assignment process, as if every two cells, $(x, y, w_1)$ and $(x, y, w_2)$, compete on how to split their combined weight $\mathrm{pr}(x, y)$ between the two latent cells $(x, y, u_1)$ and $(x, y, u_2)$, thus creating a synthetic population $\mathrm{pr}(x, y, u)$ governed by equation (6). Fig. 2 describes how $\mathrm{pr}(w_1 \mid x, y)$, the fraction of the weight held by the $(x, y, w_1)$ cell, determines the ratio $\mathrm{pr}(u_1 \mid x, y)/\mathrm{pr}(u_2 \mid x, y)$ that is eventually received by cells $(x, y, u_1)$ and $(x, y, u_2)$ respectively. We see that when $\mathrm{pr}(w_1 \mid x, y)$ approaches $1 - \mathrm{pr}(w_2 \mid u_1)$, most of the $\mathrm{pr}(x, y)$ weight goes to cell $(x, y, u_1)$, whereas when $\mathrm{pr}(w_1 \mid x, y)$ approaches $\mathrm{pr}(w_1 \mid u_2)$, most of that weight goes to cell $(x, y, u_2)$.

Clearly, when $\mathrm{pr}(w_2 \mid u_1) + \mathrm{pr}(w_1 \mid u_2) = 1$, or $U \perp\!\!\!\perp W$, $W$ provides no information about $U$ and the inverse does not exist. Likewise, whenever any of the synthetic distributions $\mathrm{pr}(x, y, u)$ falls outside the $(0, 1)$ interval, a modeling constraint is violated (see Pearl (1988, Chapter 8)), meaning that the observed distribution $\mathrm{pr}(x, y, w)$ and the postulated error mechanism $\mathrm{pr}(w \mid u)$ are incompatible with the structure of Fig. 1. If we assign reasonable priors to $\mathrm{pr}(w_2 \mid u_1)$ and $\mathrm{pr}(w_1 \mid u_2)$, the linear function in Fig. 2 will become an S-shaped curve over the entire $[0, 1]$ interval, and each sample $(x, y, w)$ can then be used to update the relative weight $\mathrm{pr}(x, y, u_1)/\mathrm{pr}(x, y, u_2)$.
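
The binary mass re-assignment and the compatibility check can be sketched as follows. The function `restore_binary`, the tolerance, and the numbers are hypothetical; the second call deliberately postulates a wrong error rate to show how negative synthetic mass signals a conflict between the data and the postulated error mechanism.

```python
import numpy as np

def restore_binary(p_xyw, a, b):
    """Binary form of equation (5), with a = pr(w2 | u1) and b = pr(w1 | u2).

    p_xyw[x, y, 0] = pr(x, y, w1) and p_xyw[x, y, 1] = pr(x, y, w2).
    Returns the synthetic pr(x, y, u) and a flag that is False when some
    cell receives negative mass.
    """
    det = 1.0 - a - b
    if np.isclose(det, 0.0):
        raise ValueError("pr(w2|u1) + pr(w1|u2) = 1: W carries no information on U")
    p_xy = p_xyw.sum(axis=2)                       # combined weight pr(x, y)
    p_u1 = (p_xyw[..., 0] - b * p_xy) / det
    p_u2 = (p_xyw[..., 1] - a * p_xy) / det
    p_xyu = np.stack([p_u1, p_u2], axis=-1)
    compatible = bool(np.all(p_xyu >= -1e-12))
    return p_xyu, compatible

# Hypothetical latent model and true error rates.
p_u = np.array([0.6, 0.4])
p_x_u = np.array([[0.8, 0.2], [0.3, 0.7]])
p_y_ux = np.array([[[0.7, 0.3], [0.4, 0.6]],
                   [[0.5, 0.5], [0.1, 0.9]]])
p_xyu = np.einsum("u,ux,uxy->xyu", p_u, p_x_u, p_y_ux)
a_true, b_true = 0.1, 0.2
m_wu = np.array([[1 - a_true, b_true], [a_true, 1 - b_true]])
p_xyw = np.einsum("ju,xyu->xyj", m_wu, p_xyu)

restored, ok = restore_binary(p_xyw, a_true, b_true)   # correct error rates
_, ok_wrong = restore_binary(p_xyw, 0.1, 0.45)         # wrongly postulated pr(w1|u2)
```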

Fig. 2: A curve describing how the weight $\mathrm{pr}(x, y)$ is distributed to cells $(x, y, u_1)$ and $(x, y, u_2)$, as a function of $\mathrm{pr}(w_1 \mid x, y)$.


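To compute the causal effect $\mathrm{pr}\{y \mid do(x)\}$, we need only substitute $\mathrm{pr}(x, y, u)$ from equation (9) into equation (2) or (6), which gives

$$\begin{aligned}
\mathrm{pr}\{y \mid do(x)\} = \frac{1}{1 - \mathrm{pr}(w_2 \mid u_1) - \mathrm{pr}(w_1 \mid u_2)} \Bigg[
&\frac{\mathrm{pr}(x, y, w_1)\left\{1 - \dfrac{\mathrm{pr}(w_1 \mid u_2)}{\mathrm{pr}(w_1 \mid x, y)}\right\}\left\{1 - \dfrac{\mathrm{pr}(w_1 \mid u_2)}{\mathrm{pr}(w_1)}\right\}}{\mathrm{pr}(x \mid w_1) - \dfrac{\mathrm{pr}(w_1 \mid u_2)\,\mathrm{pr}(x)}{\mathrm{pr}(w_1)}} \\
+ &\frac{\mathrm{pr}(x, y, w_2)\left\{1 - \dfrac{\mathrm{pr}(w_2 \mid u_1)}{\mathrm{pr}(w_2 \mid x, y)}\right\}\left\{1 - \dfrac{\mathrm{pr}(w_2 \mid u_1)}{\mathrm{pr}(w_2)}\right\}}{\mathrm{pr}(x \mid w_2) - \dfrac{\mathrm{pr}(w_2 \mid u_1)\,\mathrm{pr}(x)}{\mathrm{pr}(w_2)}} \Bigg].
\end{aligned} \tag{10}$$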

This expression highlights the difference between the standard and modified adjustment for $W$; the former (equation (2)), which is valid if $W = U$, is given by the standard inverse probability weighting (e.g., Pearl, 2009, equation (3.11)):

$$\mathrm{pr}\{y \mid do(x)\} = \frac{\mathrm{pr}(x, y, w_1)}{\mathrm{pr}(x \mid w_1)} + \frac{\mathrm{pr}(x, y, w_2)}{\mathrm{pr}(x \mid w_2)}.$$

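The extra factors in equation (10) can be viewed as modifiers of the inverse probability weight needed for a bias-free estimate. Alternatively, these terms can be used to assess, given $\mathrm{pr}(w_2 \mid u_1)$ and $\mathrm{pr}(w_1 \mid u_2)$, what bias would be introduced if we ignore errors altogether and treat $W$ as a faithful representation of $U$. When both $\mathrm{pr}(w_2 \mid u_1) \ll 1$ and $\mathrm{pr}(w_1 \mid u_2) \ll 1$ hold, the first-order approximation of equation (10) reads:

$$\begin{aligned}
\mathrm{pr}\{y \mid do(x)\} \approx\; &\frac{\mathrm{pr}(x, y, w_1)}{\mathrm{pr}(x \mid w_1)} \left[1 + \mathrm{pr}(w_2 \mid u_1) + \mathrm{pr}(w_1 \mid u_2)\left\{1 - \frac{1}{\mathrm{pr}(w_1 \mid x, y)} - \frac{1 - \mathrm{pr}(x)/\mathrm{pr}(x \mid w_1)}{\mathrm{pr}(w_1)}\right\}\right] \\
+\; &\frac{\mathrm{pr}(x, y, w_2)}{\mathrm{pr}(x \mid w_2)} \left[1 + \mathrm{pr}(w_1 \mid u_2) + \mathrm{pr}(w_2 \mid u_1)\left\{1 - \frac{1}{\mathrm{pr}(w_2 \mid x, y)} - \frac{1 - \mathrm{pr}(x)/\mathrm{pr}(x \mid w_2)}{\mathrm{pr}(w_2)}\right\}\right].
\end{aligned}$$

We see that, even with two error parameters (i.e., $\mathrm{pr}(w_2 \mid u_1)$ and $\mathrm{pr}(w_1 \mid u_2)$) and eight cells, the expression for $\mathrm{pr}\{y \mid do(x)\}$ does not simplify to provide an intuitive understanding of the effect of $\mathrm{pr}(w_2 \mid u_1)$ and $\mathrm{pr}(w_1 \mid u_2)$ on the estimand. Such evaluation will be facilitated in linear models (Section 4).

Fig. 3: A causal model with two proxy variables of $U$, permitting the identification of $\mathrm{pr}\{y \mid do(x)\}$.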

Assume now that $U$ is a sufficient set of $K$ binary variables and, similarly, that $W$ is a set of $K$ local indicators of $U$ satisfying $\mathrm{pr}(w \mid u) = \mathrm{pr}(w \mid x, y, u)$. Each sample $(x, y, w)$ should give rise to a synthetic distribution over the $2^K$ cells of $(x, y, u)$, given by a product of $K$ local distributions in the form of equation (9). This synthetic distribution can be sampled to generate synthetic $(x, y, u)$ samples, from which the true propensity score $L(u) = \mathrm{pr}(x_1 \mid u)$ as well as the causal effect $\mathrm{pr}\{y \mid do(x)\}$ can be estimated, as discussed in Section 2.1.
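
One way to read the product of local distributions is through Kronecker products. The sketch below assumes, as an interpretation not spelled out in the text, that each local indicator depends only on its own component of U; under that assumption the joint error matrix is the Kronecker product of the K local matrices, and its inverse is the Kronecker product of the local inverses. Names and numbers are hypothetical, and the example uses K = 2 on a marginal distribution over the cells of U.

```python
import numpy as np
from functools import reduce

def joint_error_matrix(local_ms):
    """M(w, u) over all 2^K cells, assuming each W_k depends only on U_k."""
    return reduce(np.kron, local_ms)

def joint_inverse(local_ms):
    """I(w, u) as the Kronecker product of the K local 2x2 inverses."""
    return reduce(np.kron, [np.linalg.inv(m) for m in local_ms])

# Hypothetical local error matrices, m[j, i] = pr(w_j | u_i) for one component.
m1 = np.array([[0.9, 0.2], [0.1, 0.8]])
m2 = np.array([[0.7, 0.1], [0.3, 0.9]])

# Cells of (u^1, u^2) in lexicographic order: (1,1), (1,2), (2,1), (2,2).
p_u = np.array([0.4, 0.1, 0.2, 0.3])
p_w = joint_error_matrix([m1, m2]) @ p_u
p_u_restored = joint_inverse([m1, m2]) @ p_w
```

The same matrix applies cell by cell to pr(x, y, w) for every fixed (x, y), so the 2^K by 2^K inverse never has to be computed directly.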


3.  Effect  Restoration  without  External  Studies 


In  this  section,  we  will  tackle  the  more  difficult  problem  of  estimating  causal  effects  without  prior 
knowledge  of  the  noise  statistics.  We  will  show  that,  under  certain  conditions,  causal  effects  can  be 
restored  from  proxy  measurements  alone. 

Consider  a  causal  diagram  shown  in  Fig.  3  which  is  obtained  by  adding  an  observed  variable  Z 
to  Fig.  1.  We  will  first  show  that  pr(x,  y,  u )  can  be  recovered  from  pr(x,  y,  z,w )  under  the  following 
conditions: 


(a) two proxy variables of $U$ that are conditionally independent of each other given $U$ can be observed (e.g., $W$ and $Z$ in Fig. 3), and $U$ satisfies both $W \perp\!\!\!\perp \{X, Y, Z\} \mid U$ and $Y \perp\!\!\!\perp \{W, Z\} \mid \{U, X\}$, as in Fig. 3.

(b) the confounder $U$ is a discrete variable with a given finite number of categories, while $X$, $Y$, $Z$, and $W$ may be continuous or discrete, as long as the number of categories of $W$ and $Z$ is greater than or equal to that of $U$.


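To show that, we first rearrange $\mathrm{pr}(y \mid x, u_1), \ldots, \mathrm{pr}(y \mid x, u_k)$ in decreasing order and relabel $\{u_1, \ldots, u_k\}$ as $\{u_{(1)}, \ldots, u_{(k)}\}$ such that $\mathrm{pr}(y \mid x, u_{(1)}) > \cdots > \mathrm{pr}(y \mid x, u_{(k)})$ for a given $x$ and $y$, and then we recover $\mathrm{pr}(x, y, u)$ from $\mathrm{pr}(x, y, z, w)$ using eigenvalue analysis.

From Fig. 3, with $U$, $W$ and $Z$ taking on values $u \in \{u_1, \ldots, u_k\} = \{u_{(1)}, \ldots, u_{(k)}\}$, $w \in \{w_1, \ldots, w_k\}$ and $z \in \{z_1, \ldots, z_k\}$ respectively, we have

$$\begin{aligned}
\mathrm{pr}(z, w \mid x) &= \sum_{i=1}^{k} \mathrm{pr}(w \mid u_i)\,\mathrm{pr}(z \mid x, u_i)\,\mathrm{pr}(u_i \mid x) = \sum_{i=1}^{k} \mathrm{pr}(w \mid u_{(i)})\,\mathrm{pr}(z \mid x, u_{(i)})\,\mathrm{pr}(u_{(i)} \mid x), \\
\mathrm{pr}(y, w \mid x) &= \sum_{i=1}^{k} \mathrm{pr}(w \mid u_i)\,\mathrm{pr}(y \mid x, u_i)\,\mathrm{pr}(u_i \mid x) = \sum_{i=1}^{k} \mathrm{pr}(w \mid u_{(i)})\,\mathrm{pr}(y \mid x, u_{(i)})\,\mathrm{pr}(u_{(i)} \mid x), \\
\mathrm{pr}(y, z \mid x) &= \sum_{i=1}^{k} \mathrm{pr}(y \mid x, u_i)\,\mathrm{pr}(z \mid x, u_i)\,\mathrm{pr}(u_i \mid x) = \sum_{i=1}^{k} \mathrm{pr}(y \mid x, u_{(i)})\,\mathrm{pr}(z \mid x, u_{(i)})\,\mathrm{pr}(u_{(i)} \mid x), \\
\mathrm{pr}(y, z, w \mid x) &= \sum_{i=1}^{k} \mathrm{pr}(w \mid u_i)\,\mathrm{pr}(z \mid x, u_i)\,\mathrm{pr}(y \mid x, u_i)\,\mathrm{pr}(u_i \mid x) \\
&= \sum_{i=1}^{k} \mathrm{pr}(w \mid u_{(i)})\,\mathrm{pr}(z \mid x, u_{(i)})\,\mathrm{pr}(y \mid x, u_{(i)})\,\mathrm{pr}(u_{(i)} \mid x).
\end{aligned}$$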

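Denote by $P(z, w)$, $Q(z, w)$, $U(w, u)$ and $R(z, u)$ the following matrices:

$$P(z, w) = \begin{pmatrix}
1 & \mathrm{pr}(w_1 \mid x) & \cdots & \mathrm{pr}(w_{k-1} \mid x) \\
\mathrm{pr}(z_1 \mid x) & \mathrm{pr}(z_1, w_1 \mid x) & \cdots & \mathrm{pr}(z_1, w_{k-1} \mid x) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{pr}(z_{k-1} \mid x) & \mathrm{pr}(z_{k-1}, w_1 \mid x) & \cdots & \mathrm{pr}(z_{k-1}, w_{k-1} \mid x)
\end{pmatrix},$$

$$Q(z, w) = \begin{pmatrix}
\mathrm{pr}(y \mid x) & \mathrm{pr}(y, w_1 \mid x) & \cdots & \mathrm{pr}(y, w_{k-1} \mid x) \\
\mathrm{pr}(y, z_1 \mid x) & \mathrm{pr}(y, z_1, w_1 \mid x) & \cdots & \mathrm{pr}(y, z_1, w_{k-1} \mid x) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{pr}(y, z_{k-1} \mid x) & \mathrm{pr}(y, z_{k-1}, w_1 \mid x) & \cdots & \mathrm{pr}(y, z_{k-1}, w_{k-1} \mid x)
\end{pmatrix},$$

$$U(w, u) = \begin{pmatrix}
1 & \mathrm{pr}(w_1 \mid u_{(1)}) & \cdots & \mathrm{pr}(w_{k-1} \mid u_{(1)}) \\
\vdots & \vdots & \ddots & \vdots \\
1 & \mathrm{pr}(w_1 \mid u_{(k)}) & \cdots & \mathrm{pr}(w_{k-1} \mid u_{(k)})
\end{pmatrix}, \quad
R(z, u) = \begin{pmatrix}
1 & \mathrm{pr}(z_1 \mid x, u_{(1)}) & \cdots & \mathrm{pr}(z_{k-1} \mid x, u_{(1)}) \\
\vdots & \vdots & \ddots & \vdots \\
1 & \mathrm{pr}(z_1 \mid x, u_{(k)}) & \cdots & \mathrm{pr}(z_{k-1} \mid x, u_{(k)})
\end{pmatrix},$$

and let $A(u) = \mathrm{diag}\{\mathrm{pr}(y \mid x, u_{(1)}), \ldots, \mathrm{pr}(y \mid x, u_{(k)})\}$ and $M(u) = \mathrm{diag}\{\mathrm{pr}(u_{(1)} \mid x), \ldots, \mathrm{pr}(u_{(k)} \mid x)\}$, where $\mathrm{diag}\{a_1, \ldots, a_k\}$ is a $k \times k$ diagonal matrix whose diagonal entries, starting in the upper left corner, are $a_1, \ldots, a_k$.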

Assume further that

(c) both $P(z, w)$ and $Q(z, w)$ are invertible, and

(d) $\mathrm{pr}(y \mid x, u_1), \ldots, \mathrm{pr}(y \mid x, u_k)$ take on distinct values

for any $x$ and $y$. Then, writing $P(z, w) = R(z, u)' M(u) U(w, u)$ and $Q(z, w) = R(z, u)' M(u) A(u) U(w, u)$, we have

$$\begin{aligned}
P(z, w)^{-1} Q(z, w) &= \{R(z, u)' M(u) U(w, u)\}^{-1} \{R(z, u)' M(u) A(u) U(w, u)\} \\
&= U(w, u)^{-1} M(u)^{-1} \{R(z, u)'\}^{-1} R(z, u)' M(u) A(u) U(w, u) \\
&= U(w, u)^{-1} M(u)^{-1} M(u) A(u) U(w, u) = U(w, u)^{-1} A(u) U(w, u),
\end{aligned}$$

where a prime ($'$) indicates that a vector/matrix is transposed. Thus, the problem of recovering $\mathrm{pr}(w \mid u)$ from $U(w, u)$ rests on solving the eigenvalue problem of $P(z, w)^{-1} Q(z, w)$. Once $\mathrm{pr}(w \mid u)$ is known, we can evaluate causal effects by using the matrix adjustment method in Section 2.1. Based on this consideration, the following theorem can be obtained:

Theorem 1. Under conditions (a), (b), (c) and (d), if $U$ is a sufficient confounder relative to an ordered pair of variables $(X, Y)$, then the causal effect $\mathrm{pr}\{y \mid do(x)\}$ of $X$ on $Y$ is identifiable.

The proof is provided in the Appendix.
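
The eigenvalue step behind Theorem 1 can be sketched numerically. This is an illustration for one fixed pair (x, y) with k = 3, using hypothetical conditional distributions; here P and Q are generated from the latent model through the stated factorizations, whereas in practice they would be assembled from observed frequencies. The function name `recover_from_proxies` is an assumption of this sketch. Since $P^{-1}Q = U^{-1}AU$, the rows of $U(w, u)$ are left eigenvectors of $P^{-1}Q$, scaled so that their first entry is 1.

```python
import numpy as np

def recover_from_proxies(P, Q):
    """Eigen-analysis of P(z,w)^{-1} Q(z,w) = U(w,u)^{-1} A(u) U(w,u).

    Returns the eigenvalues pr(y | x, u_(i)) in decreasing order and the
    matrix U(w, u) with rows (1, pr(w_1 | u_(i)), ..., pr(w_{k-1} | u_(i))).
    """
    B = np.linalg.solve(P, Q)                    # P^{-1} Q
    eigvals, eigvecs = np.linalg.eig(B.T)        # left eigenvectors of B
    order = np.argsort(-eigvals.real)
    rows = eigvecs.real[:, order].T
    return eigvals.real[order], rows / rows[:, [0]]

# Hypothetical latent quantities for one fixed (x, y), already ordered.
p_u_x = np.array([0.5, 0.3, 0.2])                # pr(u_(i) | x)
p_y_xu = np.array([0.9, 0.6, 0.2])               # pr(y | x, u_(i)), distinct values
p_w_u = np.array([[0.7, 0.2, 0.1],               # pr(w_j | u_(i)), indexed [i, j]
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])
p_z_xu = np.array([[0.6, 0.3, 0.1],              # pr(z_j | x, u_(i)), indexed [i, j]
                   [0.3, 0.5, 0.2],
                   [0.2, 0.2, 0.6]])

U_true = np.column_stack([np.ones(3), p_w_u[:, :2]])
R = np.column_stack([np.ones(3), p_z_xu[:, :2]])
P = R.T @ np.diag(p_u_x) @ U_true                # P = R' M U
Q = R.T @ np.diag(p_u_x * p_y_xu) @ U_true       # Q = R' M A U

eigvals, U_hat = recover_from_proxies(P, Q)
p_w_u_hat = np.column_stack([U_hat[:, 1:], 1.0 - U_hat[:, 1:].sum(axis=1)])
```

The recovered `p_w_u_hat` can then be transposed into M(w, u) and passed to the matrix adjustment of Section 2.1.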

Here, it should be noted that $\mathrm{pr}(x, y, u)$ is not identifiable because we do not know whether $\mathrm{pr}(x, y, u_i) = \mathrm{pr}(x, y, u_{(i)})$ holds for $i = 1, \ldots, k$. That is, letting $\{\lambda_1, \ldots, \lambda_k\}$ be the set of eigenvalues of $P(z, w)^{-1} Q(z, w)$, we know that the set $\{\lambda_1, \ldots, \lambda_k\}$ of solutions of $|P(z, w)^{-1} Q(z, w) - \lambda I_k| = 0$ is consistent with the set $\{\mathrm{pr}(y \mid x, u_1), \ldots, \mathrm{pr}(y \mid x, u_k)\}$ of distributions, but we do not know which solution of $|P(z, w)^{-1} Q(z, w) - \lambda I_k| = 0$ corresponds to each $\mathrm{pr}(y \mid x, u_i)$ $(i = 1, \ldots, k)$. The causal effect is nevertheless identifiable because it involves the summation over $U = u$, not the individual solutions of $|P(z, w)^{-1} Q(z, w) - \lambda I_k| = 0$.

If, on the other hand, we have knowledge of the correspondence, e.g., by establishing the orders $\lambda_1 > \cdots > \lambda_k$ and $\mathrm{pr}(y \mid x, u_1) > \cdots > \mathrm{pr}(y \mid x, u_k)$, and $X$ is a discrete variable with a given finite number of categories, then the condition $W \perp\!\!\!\perp \{Z, Y\} \mid U$ can be relaxed to $W \perp\!\!\!\perp \{Z, Y\} \mid \{U, X\}$. To see this, when we replace $U(w, u)$ by the $k \times k$ matrix

$$U_x(w, u) = \begin{pmatrix}
1 & \mathrm{pr}(w_1 \mid x, u_{(1)}) & \cdots & \mathrm{pr}(w_{k-1} \mid x, u_{(1)}) \\
\vdots & \vdots & \ddots & \vdots \\
1 & \mathrm{pr}(w_1 \mid x, u_{(k)}) & \cdots & \mathrm{pr}(w_{k-1} \mid x, u_{(k)})
\end{pmatrix},$$

both $P(z, w) = R(z, u)' M(u) U_x(w, u)$ and $Q(z, w) = R(z, u)' M(u) A(u) U_x(w, u)$ hold. If both $P(z, w)$ and $Q(z, w)$ are invertible, then using the steps in the Appendix, the causal effect is identifiable.

This derivation demonstrates that, whenever we observe two independent proxy variables associated with an unmeasured confounder, the distribution of the latter can be constructed from the proxies, which renders the causal effect identifiable. Thus, our result extends the range of solvable identification problems (Pearl, 2009, Chapters 3 and 4; Shpitser and Pearl, 2006; Tian and Pearl, 2007) to cases where discrete confounders are measured with error. However, it should be noted that the identifiability criteria developed in (Pearl, 2009; Shpitser and Pearl, 2006; Tian and Pearl, 2003) apply to nonparametric models where the dimensionality of the variables is assumed arbitrary, while our result applies to causal models with a finitely discretized confounder. Our method also provides guidance on how to choose proxy variables so as to construct the distribution of the unmeasured confounders from the proxies.


4.  Effect  Restoration  in  Linear  Structural  Equation  Models 

4.1.  Linear  Structural  Equation  Model 

In this section, we assume each child-parent family in the graph $G$ represents a linear structural equation model (SEM)

$$V_i = \sum_{V_j \in \mathrm{pa}(V_i)} \alpha_{v_i v_j} V_j + \epsilon_{v_i}, \quad i = 1, 2, \ldots, n, \tag{11}$$

where normal random disturbances $\epsilon_{v_1}, \epsilon_{v_2}, \ldots, \epsilon_{v_n}$ are assumed to be independent of each other and have mean 0. In addition, $\alpha_{v_i v_j}$ is a constant value, and $\alpha_{v_i v_j}$ $(\neq 0)$ is called a path coefficient or a direct effect. For the details on linear structural equation models, see Bollen (1989).

The following notation will be used in our discussion. For univariates $X$ and $Y$ and a set $Z$ of variables, let $\sigma_{xy \cdot z} = \mathrm{cov}(X, Y \mid Z = z)$, $\sigma_{xx \cdot z} = \mathrm{var}(X \mid Z = z)$ and $\beta_{yx \cdot z} = \sigma_{xy \cdot z}/\sigma_{xx \cdot z}$. For disjoint sets $X$, $Y$ and $Z$, let $\Sigma_{xy \cdot z}$ be the conditional covariance matrix of $X$ and $Y$ given $Z = z$. In addition, let $\Sigma_{xx \cdot z}$ be the conditional covariance matrix of $X$ given $Z = z$, and let $B_{yx \cdot z} = \Sigma_{yx \cdot z} \Sigma_{xx \cdot z}^{-1}$ be the regression coefficient matrix of $x$ in the regression of $Y$ on $X \cup Z$. We use the same notation in the case where either $X$ or $Y$ is univariate. When $Z$ is an empty set, $Z$ will be omitted from the expressions above. Similar notation is used for other parameters. Note the critical distinction between $\alpha_{yx}$ and $\beta_{yx}$. The former are structural coefficients that convey causal information; the latter are regression coefficients, which are purely statistical.

The total effect $\tau_{yx}$ of $X$ on $Y$ is defined as the total sum of the products of the path coefficients on the sequence of arrows along all directed paths from $X$ to $Y$. $\tau_{yx}$ can often be identified from graphs using the back door criterion. That is, if a set $S$ of observed variables satisfies the back door criterion relative to an ordered pair of variables $(X, Y)$, then the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable, and is given by the regression coefficient $\beta_{yx \cdot s}$ (Pearl, 2009).
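
The back door result for linear models can be checked on population covariances, which avoids sampling noise. The sketch below assumes a hypothetical three-variable SEM (U -> X, U -> Y, X -> Y) with unit disturbance variances; the helper names `sem_covariance` and `partial_coef` are illustrative.

```python
import numpy as np

def sem_covariance(A, noise_var):
    """Population covariance of V = A V + e with independent disturbances e."""
    T = np.linalg.inv(np.eye(len(A)) - A)
    return T @ np.diag(noise_var) @ T.T

def partial_coef(cov, y, x, z=()):
    """beta_{yx.z}: coefficient of x in the regression of y on x and z."""
    idx = [x, *z]
    return np.linalg.solve(cov[np.ix_(idx, idx)], cov[idx, y])[0]

# Hypothetical path coefficients; A[i, j] is the coefficient of V_j in V_i.
U, X, Y = 0, 1, 2
A = np.zeros((3, 3))
A[X, U], A[Y, U], A[Y, X] = 0.8, 0.5, 0.3        # total effect of X on Y is 0.3
cov = sem_covariance(A, [1.0, 1.0, 1.0])

tau_adjusted = partial_coef(cov, Y, X, (U,))     # beta_{yx.u}
beta_unadjusted = partial_coef(cov, Y, X)        # beta_{yx}, confounded by U
```

Adjusting for U returns the structural value 0.3, while the unadjusted regression coefficient is about 0.544.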

Another identification condition invokes an instrumental variable (IV) (Brito and Pearl, 2002). Let $\{X, Y, Z\}$ and $S$ be disjoint subsets of $V$ in a directed acyclic graph $G$. If a set $S \cup \{Z\}$ of variables satisfies (i) $S$ contains no descendants of $X$ or $Y$ in $G$, and (ii) $S$ d-separates $Z$ from $Y$ but not from $X$ in the graph obtained by deleting all arrows emerging from $X$, then $Z$ is said to be a conditional instrumental variable (CIV) given $S$ relative to an ordered pair of variables $(X, Y)$ (Pearl, 2009, p. 366; see also Brito and Pearl, 2002). By CIV, we mean a variable that becomes an instrument relative to the target effect upon conditioning on a set $S$ of variables. If an observed variable $Z$ is a CIV given $S$ relative to an ordered pair of variables $(X, Y)$, then the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable, and is given by $\sigma_{yz \cdot s}/\sigma_{xz \cdot s}$ (Brito and Pearl, 2002). In particular, when $S$ is an empty set, $Z$ is called an instrumental variable (IV) (Bowden and Turkington, 1984).

To derive a new graphical identification condition for total effects, we review some properties
of the regression coefficients. First, when $\{X, Y\} \cup S \cup T$ are normally distributed, we have the
identity $\beta_{yx\cdot s} = \beta_{yx\cdot st} + B_{yt\cdot xs}B_{tx\cdot s}$ (Cochran, 1938). Second, if $T$ is conditionally independent
of $X$ given $S$ or $Y$ is conditionally independent of $T$ given $\{X\} \cup S$, then $\beta_{yx\cdot st} = \beta_{yx\cdot s}$ (Wermuth,
1989). Third, $\beta_{yx\cdot z}$ is given by $\beta_{yx\cdot z} = (\sigma_{xy} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zy})/(\sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})$ because we
have $\sigma_{xy\cdot z} = \sigma_{xy} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zy}$ and $\sigma_{xx\cdot z} = \sigma_{xx} - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx}$.

Fig. 4: Linear SEMs with proxy variables of $U$, for the identification of $\tau_{yx}$. (a) requires knowledge
of $\alpha_{wu}^2\sigma_{uu}$, while (b), (c) and (d) identify $\tau_{yx}$ from data.
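The first property can be checked numerically in its simplest form. The following Python sketch treats the scalar case with $S$ empty and $X$, $Y$, $T$ univariate; the covariance values and the helper name `partial_beta` are illustrative assumptions of ours, not quantities taken from the paper.

```python
# Illustrative check of Cochran's identity  beta_yx = beta_yx.t + beta_yt.x * beta_tx
# for scalar X, Y, T (S empty). All covariance values are arbitrary, hypothetical numbers.

def partial_beta(s_ab, s_aa, s_ac, s_bc, s_cc):
    """Coefficient of a in the regression of b on {a, c}: sigma_ab.c / sigma_aa.c."""
    s_ab_c = s_ab - s_ac * s_bc / s_cc   # partial covariance of a and b given c
    s_aa_c = s_aa - s_ac * s_ac / s_cc   # partial variance of a given c
    return s_ab_c / s_aa_c

s_xx, s_tt = 2.0, 1.5                    # variances of X and T
s_xy, s_xt, s_ty = 1.0, 0.5, 0.8         # covariances

beta_yx = s_xy / s_xx                    # regression of Y on X alone
beta_tx = s_xt / s_xx                    # regression of T on X alone
beta_yx_t = partial_beta(s_xy, s_xx, s_xt, s_ty, s_tt)  # coefficient of x, adjusting for t
beta_yt_x = partial_beta(s_ty, s_tt, s_xt, s_xy, s_xx)  # coefficient of t, adjusting for x

lhs = beta_yx
rhs = beta_yx_t + beta_yt_x * beta_tx
print(lhs, rhs)
```

Both sides agree up to floating-point error, as the identity requires for any valid covariance matrix.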


4.2.  Identification  using  proxy  variables 

In  this  section,  we  consider  the  linear  version  of  the  problem  discussed  in  Section  3,  i.e.,  estimating 
the  total  effect  of  X  on  Y  when  a  sufficient  covariate  U  is  measured  via  proxy  variables,  as  shown 
in  Fig.  4. 

The  linear  SEM  offers  two  advantages  in  handling  measurement  errors.  First,  it  provides  a  more 
transparent  picture  into  the  role  of  each  factor  in  the  model.  Second,  there  are  quite  a  few  graphical 
structures  in  which  the  causal  effect  is  identifiable  in  linear  models  but  not  in  nonparametric  models. 

To see this, consider the causal diagrams shown in Fig. 4. Since $U$ is sufficient in Fig. 4, the
total effect is identifiable from the measurement on $X$, $Y$ and $U$, and is given by $\tau_{yx} = \beta_{yx\cdot u}$,
the regression coefficient of $Y$ on $X$ and $U$. However, if $U$ is unobserved and $W$ is but a noisy
measurement of $U$, as in Fig. 4(a), knowledge of the error mechanism $W = \alpha_{wu}U + \epsilon_w$ is needed
in order to identify $\tau_{yx} = \beta_{yx\cdot u}$. We note, however, that knowledge of both $\alpha_{wu}$ and $\sigma_{uu}$ is not
necessary; the product $\alpha_{wu}^2\sigma_{uu}$ is sufficient. To see this, we write


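$$\beta_{yx\cdot u} = \frac{\sigma_{xy} - \sigma_{xu}\sigma_{yu}/\sigma_{uu}}{\sigma_{xx} - \sigma_{xu}^2/\sigma_{uu}} = \frac{\sigma_{xy} - \alpha_{wu}^2\sigma_{xu}\sigma_{yu}/(\alpha_{wu}^2\sigma_{uu})}{\sigma_{xx} - \alpha_{wu}^2\sigma_{xu}^2/(\alpha_{wu}^2\sigma_{uu})}$$

and, from $\sigma_{xw} = \sigma_{xu}\alpha_{wu}$ and $\sigma_{yw} = \sigma_{yu}\alpha_{wu}$, we have

$$\tau_{yx} = \frac{\sigma_{xy} - \sigma_{xw}\sigma_{yw}/(\alpha_{wu}^2\sigma_{uu})}{\sigma_{xx} - \sigma_{xw}^2/(\alpha_{wu}^2\sigma_{uu})}. \qquad (12)$$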


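We see that, if $\alpha_{wu}^2\sigma_{uu}$ is given, $\tau_{yx}$ is identifiable.

Next, we consider the identification of $\tau_{yx}$ without external information. We first show that
if $U$ possesses two independent proxy variables, say $W$ and $Z$ (as in Fig. 4(b)), then $\alpha_{wu}^2\sigma_{uu}$ is
identifiable. Indeed, writing $\sigma_{xw} = \alpha_{wu}\alpha_{xu}\sigma_{uu}$, $\sigma_{wz} = \alpha_{wu}\alpha_{zu}\sigma_{uu}$ and $\sigma_{xz} = \alpha_{xu}\alpha_{zu}\sigma_{uu}$, we
have

$$\frac{\sigma_{xw}\sigma_{wz}}{\sigma_{xz}} = \frac{(\alpha_{wu}\alpha_{xu}\sigma_{uu})(\alpha_{wu}\alpha_{zu}\sigma_{uu})}{\alpha_{xu}\alpha_{zu}\sigma_{uu}} = \alpha_{wu}^2\sigma_{uu}. \qquad (13)$$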

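By substituting equation (13) into equation (12), we can see that $\tau_{yx}$ $(= \alpha_{yx})$ is identifiable and is
given by

$$\tau_{yx} = \beta_{yx\cdot u} = \frac{\sigma_{xy}\sigma_{wz} - \sigma_{xz}\sigma_{yw}}{\sigma_{xx}\sigma_{wz} - \sigma_{xw}\sigma_{xz}}. \qquad (14)$$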
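To make equation (14) concrete, the sketch below builds the covariances implied by a small linear SEM of the kind depicted in Fig. 4(b) and evaluates the right-hand side $(\sigma_{xy}\sigma_{wz} - \sigma_{xz}\sigma_{yw})/(\sigma_{xx}\sigma_{wz} - \sigma_{xw}\sigma_{xz})$ from the observed covariances only. The path coefficients, the error variances and the helper `implied_cov` are hypothetical choices made for illustration; the graph structure (U -> X, U -> W, U -> Z, U -> Y, X -> Y) is our reading of Fig. 4(b).

```python
# Sketch: check equation (14) on a hypothetical linear SEM shaped like Fig. 4(b),
# with mutually independent error terms. All numbers are illustrative assumptions.

def implied_cov(order, coef, err_var):
    """Covariances implied by a recursive linear SEM.

    order   -- variable names in topological order
    coef    -- coef[child][parent] = path coefficient
    err_var -- variance of each variable's independent error term
    """
    cov = {}
    for i, v in enumerate(order):
        parents = coef.get(v, {})
        for t in order[:i]:
            # cov(v, t) = sum over parents p of alpha_vp * cov(p, t)
            c = sum(a * cov[(p, t)] for p, a in parents.items())
            cov[(v, t)] = cov[(t, v)] = c
        cov[(v, v)] = sum(a * cov[(p, v)] for p, a in parents.items()) + err_var[v]
    return cov

order = ["U", "X", "W", "Z", "Y"]
coef = {
    "X": {"U": 0.8},
    "W": {"U": 1.5},
    "Z": {"U": -0.7},
    "Y": {"X": 0.5, "U": 1.2},   # alpha_yx = 0.5 is the target total effect
}
err_var = {"U": 2.0, "X": 1.0, "W": 0.5, "Z": 1.0, "Y": 1.0}

s = implied_cov(order, coef, err_var)

# Equation (14): uses covariances of the observed variables X, Y, W, Z only.
tau_14 = (s["X", "Y"] * s["W", "Z"] - s["X", "Z"] * s["Y", "W"]) / (
    s["X", "X"] * s["W", "Z"] - s["X", "W"] * s["X", "Z"]
)
naive = s["X", "Y"] / s["X", "X"]   # unadjusted regression slope, confounded by U
print(tau_14, naive)
```

In this configuration the formula returns the structural coefficient 0.5, whereas the unadjusted slope does not.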


This result reflects the well known fact (e.g., Bollen, 1989, p. 224) that, in linear SEMs, structural
parameters are identifiable, up to a constant $\sigma_{uu}$, whenever each latent variable (in our case $U$) has
three independent proxies (in our case $X$, $W$ and $Z$). We see that the non-identifiability of $\sigma_{uu}$ is
not an impediment for the identification of $\alpha_{yx}$.

We next relax the requirement that $U$ possesses three independent proxies (as in Fig. 4(b)) and
consider a situation as in Fig. 4(c), where two of these proxies ($X$ and $Z$) are dependent. Here we
note that $\{X, U\}$ d-separates $Y$, $Z$ and $W$ from each other. Therefore, given $X$, the variables $Y$, $Z$
and $W$ work as three independent indicators of $U$ (i.e., $Y$, $Z$ and $W$ are conditionally independent
of each other given $\{X, U\}$). This will permit us to identify the key factor, $\alpha_{wu}^2\sigma_{uu}$, from the
measurement of $X$, $Y$, $Z$ and $W$, and obtain:

$$\alpha_{wu}^2\sigma_{uu} = \frac{\sigma_{yw\cdot x}\sigma_{wz\cdot x}}{\sigma_{yz\cdot x}} + \frac{\sigma_{xw}^2}{\sigma_{xx}}. \qquad (15)$$
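The derivation is as follows. Since $\sigma_{yw\cdot x} = \sigma_{wu\cdot x}\sigma_{yu\cdot x}/\sigma_{uu\cdot x}$, $\sigma_{wz\cdot x} = \sigma_{wu\cdot x}\sigma_{zu\cdot x}/\sigma_{uu\cdot x}$ and
$\sigma_{yz\cdot x} = \sigma_{yu\cdot x}\sigma_{zu\cdot x}/\sigma_{uu\cdot x}$, we have

$$\frac{\sigma_{yw\cdot x}\sigma_{wz\cdot x}}{\sigma_{yz\cdot x}} = \frac{(\sigma_{wu\cdot x}\sigma_{yu\cdot x}/\sigma_{uu\cdot x})(\sigma_{wu\cdot x}\sigma_{zu\cdot x}/\sigma_{uu\cdot x})}{\sigma_{yu\cdot x}\sigma_{zu\cdot x}/\sigma_{uu\cdot x}} = \frac{\sigma_{wu\cdot x}^2}{\sigma_{uu\cdot x}}.$$

Further, noting that $\beta_{wu\cdot x} = \beta_{wu} = \alpha_{wu}$ and $\sigma_{xw} = \beta_{wu}\sigma_{xu} = \alpha_{wu}\sigma_{xu}$, we have

$$\frac{\sigma_{wu\cdot x}^2}{\sigma_{uu\cdot x}} = \beta_{wu\cdot x}^2\sigma_{uu\cdot x} = \beta_{wu}^2\sigma_{uu\cdot x} = \alpha_{wu}^2\sigma_{uu\cdot x} = \alpha_{wu}^2\sigma_{uu} - \frac{\alpha_{wu}^2\sigma_{xu}^2}{\sigma_{xx}} = \alpha_{wu}^2\sigma_{uu} - \frac{\sigma_{xw}^2}{\sigma_{xx}}.$$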

Using  these  results,  equation  (15)  is  obtained.  The  first  term  of  equation  (15)  can  be  interpreted  as 
the  conditional  modified-adjustment  of  U  through  the  proxy  variable  W  given  X,  and  the  second  is 
a  correction  term,  which  transforms  the  conditional  modified-adjustment  of  U  through  W  given  X 
to  the  unconditional  modified-adjustment  of  U  through  W. 

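To derive an explicit expression for $\alpha_{yx}$, we substitute equation (15) into equation (12), and
using $\sigma_{yw\cdot x} = \sigma_{yw} - \sigma_{xy}\sigma_{xw}/\sigma_{xx}$ we have

$$\begin{aligned}
\tau_{yx} &= \frac{\sigma_{xy} - \sigma_{xw}\sigma_{yw}/(\alpha_{wu}^2\sigma_{uu})}{\sigma_{xx} - \sigma_{xw}^2/(\alpha_{wu}^2\sigma_{uu})} \\
&= \frac{\sigma_{xy}(\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx} + \sigma_{yz\cdot x}\sigma_{xw}^2) - \sigma_{xw}\sigma_{yw}\sigma_{xx}\sigma_{yz\cdot x}}{\sigma_{xx}(\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx} + \sigma_{yz\cdot x}\sigma_{xw}^2) - \sigma_{xw}^2\sigma_{xx}\sigma_{yz\cdot x}} \\
&= \frac{\sigma_{xy}\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx} + \sigma_{yz\cdot x}\sigma_{xw}(\sigma_{xy}\sigma_{xw} - \sigma_{yw}\sigma_{xx})}{\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx}^2} \\
&= \frac{\sigma_{xy}\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx} - \sigma_{xx}\sigma_{yz\cdot x}\sigma_{xw}\sigma_{yw\cdot x}}{\sigma_{yw\cdot x}\sigma_{wz\cdot x}\sigma_{xx}^2} \\
&= \frac{\sigma_{xy}\sigma_{wz\cdot x} - \sigma_{yz\cdot x}\sigma_{xw}}{\sigma_{wz\cdot x}\sigma_{xx}}. \qquad (16)
\end{aligned}$$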


We see that $\alpha_{yx} = \beta_{yx\cdot u}$ is identifiable and is given by equation (16).
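The two-step route through equations (15) and (12), and the closed form (16), can be compared on a worked example. The sketch below uses covariances that we derived by hand from a hypothetical SEM with the structure we assume for Fig. 4(c) (U -> X, U -> W, U -> Z, U -> Y, X -> Z, X -> Y); the equations, coefficients and the helper `given_x` are illustrative assumptions, not values from the paper.

```python
# Sketch: equations (15), (12) and (16) on covariances implied by the hypothetical SEM
#   X = U + e_x,   W = 2U + e_w,   Z = U + X + e_z,   Y = 0.5 X + U + e_y,
# with unit variances for U and all (independent) errors.
# Hence alpha_yx = 0.5 and alpha_wu^2 * sigma_uu = 4 are the true values.
s_xx, s_xy, s_xw, s_xz = 2.0, 2.0, 2.0, 3.0
s_yw, s_yz, s_wz = 3.0, 3.5, 4.0

def given_x(s_ab, s_ax, s_bx):
    """Partial covariance sigma_ab.x = sigma_ab - sigma_ax * sigma_bx / sigma_xx."""
    return s_ab - s_ax * s_bx / s_xx

s_yw_x = given_x(s_yw, s_xy, s_xw)
s_wz_x = given_x(s_wz, s_xw, s_xz)
s_yz_x = given_x(s_yz, s_xy, s_xz)

# Equation (15): recover alpha_wu^2 * sigma_uu from observed covariances.
a2_suu = s_yw_x * s_wz_x / s_yz_x + s_xw ** 2 / s_xx

# Equation (12) with the recovered product, and the closed form (16).
tau_12 = (s_xy - s_xw * s_yw / a2_suu) / (s_xx - s_xw ** 2 / a2_suu)
tau_16 = (s_xy * s_wz_x - s_yz_x * s_xw) / (s_xx * s_wz_x)
print(a2_suu, tau_12, tau_16)
```

Both routes return the same value, which in this example equals the structural coefficient of X in the equation for Y.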

From Fig. 4(a), (b) and (c), we see that the pivotal quantity needed for the identification of $\alpha_{yx}$
is the product

$$\alpha_{wu}^2\sigma_{uu} = \sigma_{ww} - \sigma_{\epsilon_w\epsilon_w}, \qquad (17)$$

which stands for the portion of $\sigma_{ww}$ that is contributed by variations of $U$. As seen from the
consideration above, if we are in possession of several proxies for $U$, then $\alpha_{wu}^2\sigma_{uu}$ can be estimated from
the data as in equation (13) or (15), yielding equation (12). If, however, $U$ has only one proxy $W$, as in
Fig. 4(a), the product $\alpha_{wu}^2\sigma_{uu}$ must be estimated externally, using either a pilot study or judgmental
assessment.

Judgmental assessment of the product $\alpha_{wu}^2\sigma_{uu}$ can be made more meaningful through the
decomposition on the right hand side of equation (17), since both $\alpha_{wu}$ and $\epsilon_w$ are causal parameters
of the error mechanism $W = \alpha_{wu}U + \epsilon_w$. $\alpha_{wu} = E(W|u)/u$ measures the slope with which the
average of $W$ tracks the value of $U$, while $\sigma_{\epsilon_w\epsilon_w}$ measures the dispersion of $W$ around that average.
$\sigma_{ww}$ can, of course, be estimated from the data.

Under a Gaussian distribution assumption, $\alpha_{wu}$ and $\sigma_{\epsilon_w\epsilon_w}$ fully characterize the conditional
density $f(w|u)$ which, according to Section 2, is sufficient for restoring the joint distribution of $x$,
$y$ and $u$, and thus secures the identification of the causal effect through equation (2). This explains
why the estimation of $\alpha_{wu}$ alone, be it from experimental data or our understanding of the physics
behind the error process, is not sufficient for neutralizing the confounder $U$. It also explains why the
technique of latent factor analysis (Bollen, 1989) is sufficient for identifying causal effects, even
though it fails to identify the factor loading $\alpha_{wu}$ separately from $\sigma_{uu}$.

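In the noiseless case, i.e., $\sigma_{\epsilon_w\epsilon_w} = 0$, we have $\sigma_{uu} = \sigma_{ww}/\alpha_{wu}^2$ and equation (12) reduces to:

$$\tau_{yx} = \frac{\sigma_{xy} - \sigma_{xw}\sigma_{yw}/\sigma_{ww}}{\sigma_{xx} - \sigma_{xw}^2/\sigma_{ww}} = \frac{\sigma_{yx\cdot w}}{\sigma_{xx\cdot w}} = \beta_{yx\cdot w}, \qquad (18)$$

where $\beta_{yx\cdot w}$ is the regression coefficient of $x$ in the regression model of $Y$ on $X$ and $W$, or:

$$\beta_{yx\cdot w} = \frac{\partial}{\partial x}E(Y|x, w).$$

As expected, the equality $\alpha_{yx} = \beta_{yx\cdot u} = \beta_{yx\cdot w}$ assures a bias-free estimate of $\alpha_{yx}$ through
adjustment for $W$, instead of $U$; $\alpha_{wu}$ plays no role in this adjustment.

In the error-prone case, $\alpha_{yx}$ can be written as

$$\alpha_{yx} = \beta_{yx\cdot u} = \frac{\beta_{yx} - \beta_{yw}\beta_{wx}/k}{1 - \beta_{xw}\beta_{wx}/k},$$

where $k = 1 - \sigma_{\epsilon_w\epsilon_w}/\sigma_{ww}$, and, as the formula reveals, $\alpha_{yx}$ cannot be interpreted in terms of an
adjustment for a proxy variable $W$.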
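The role of $k$ can be illustrated with a few lines of Python. The sketch below evaluates the error-prone expression $(\beta_{yx} - \beta_{yw}\beta_{wx}/k)/(1 - \beta_{xw}\beta_{wx}/k)$ for a supplied $k$, and contrasts it with the value obtained by setting $k = 1$, which is plain adjustment for $W$. The function name, the covariance values and the underlying model are hypothetical; in practice $k$ would have to come from a pilot study or judgmental assessment, as discussed above.

```python
# Sketch: proxy-corrected coefficient in the error-prone case.
# k = 1 - sigma_{eps_w eps_w} / sigma_ww is assumed to be known from an external source.

def alpha_yx_with_proxy(s_xx, s_xy, s_xw, s_yw, s_ww, k):
    """Return (beta_yx - beta_yw * beta_wx / k) / (1 - beta_xw * beta_wx / k)."""
    beta_yx = s_xy / s_xx
    beta_yw = s_yw / s_ww
    beta_wx = s_xw / s_xx
    beta_xw = s_xw / s_ww
    return (beta_yx - beta_yw * beta_wx / k) / (1.0 - beta_xw * beta_wx / k)

# Hypothetical Fig. 4(a)-style model:
#   X = U + e_x,   W = 2U + e_w,   Y = 0.5 X + U + e_y,
# unit variances for U and all errors, so sigma_ww = 5 and k = 1 - 1/5 = 0.8.
cov = dict(s_xx=2.0, s_xy=2.0, s_xw=2.0, s_yw=3.0, s_ww=5.0)

corrected = alpha_yx_with_proxy(k=0.8, **cov)   # uses the true error variance
adjust_w = alpha_yx_with_proxy(k=1.0, **cov)    # treats W as error-free: equals beta_yx.w
print(corrected, adjust_w)
```

With the correct $k$ the structural coefficient 0.5 is recovered; with $k = 1$ the result is the ordinary partial regression coefficient on $W$, which remains biased in this example.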

The strategy of adjusting for a proxy variable has served as an organizing principle for many
studies in traditional measurement error analysis (Carroll et al., 2006). For example, if one seeks
to estimate the regression coefficient $\beta_{xu} = E(X|u)/u$ through a proxy $W$ of $U$, one can always
choose to regress $X$ on another variable, $V$, such that the slope of $X$ on $V$, $E(X|v)/v$, would yield
an unbiased estimate of $\beta_{xu}$. Choosing $V$ to be the best linear estimate of $U$ given $W$ would permit
such regression. In our example of Fig. 4(b), one should choose $V = \gamma W$, where

$$\gamma = \frac{\sigma_{uw}}{\sigma_{ww}} = \frac{\alpha_{wu}\sigma_{uu}}{\sigma_{ww}}$$

is to be estimated separately, from a pilot study. However, this two-stage least squares strategy is
not applicable in adjusting for latent confounders; i.e., there is no variable $V(W)$ such that $\alpha_{yx} = \beta_{yx\cdot v}$.

Fig. 4(d) represents a new challenge; although $\alpha_{wu}^2\sigma_{uu}$ is not identifiable, the total effect
$\alpha_{yx}$ is nevertheless identifiable without external studies. In the next section, we will discuss this
identification strategy.


4.3.  Instrumental  Variable  (IV)  method  with  a  proxy  variable 

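In Fig. 4(d), if $U$ can be observed, then both the back door criterion and the CIV condition can be
applied to evaluating the total effect simultaneously, giving $\tau_{yx} = \beta_{yx\cdot u}$ and $\tau_{yx} = \sigma_{yz\cdot u}/\sigma_{xz\cdot u}$,
respectively. We shall now show that equating these two expressions to each other, together with
the independence condition $\{X, Z\} \perp\!\!\!\perp W \mid U$, will allow us to remove all terms involving $u$ as a
subscript. Indeed, starting with $\sigma_{xw} = \sigma_{xu}\sigma_{wu}/\sigma_{uu}$ and $\sigma_{wz} = \sigma_{zu}\sigma_{wu}/\sigma_{uu}$, we have
$\sigma_{zu} = \sigma_{xu}\sigma_{wz}/\sigma_{xw}$. Then, using $\tau_{yx} = \beta_{yx\cdot u}$, we have

$$\sigma_{xx}\tau_{yx} - \sigma_{xy} = \frac{\sigma_{xu}^2}{\sigma_{uu}}\tau_{yx} - \frac{\sigma_{xu}\sigma_{yu}}{\sigma_{uu}},$$

and, from $\tau_{yx} = \sigma_{yz\cdot u}/\sigma_{xz\cdot u}$ and $\sigma_{zu} = \sigma_{xu}\sigma_{wz}/\sigma_{xw}$, we have

$$\sigma_{xz}\tau_{yx} - \sigma_{yz} = \frac{\sigma_{wz}}{\sigma_{xw}}\left(\frac{\sigma_{xu}^2}{\sigma_{uu}}\tau_{yx} - \frac{\sigma_{xu}\sigma_{yu}}{\sigma_{uu}}\right).$$

By solving these equations for $\tau_{yx}$, we obtain

$$\tau_{yx} = \frac{\sigma_{yz} - \sigma_{xy}\sigma_{wz}/\sigma_{xw}}{\sigma_{xz} - \sigma_{xx}\sigma_{wz}/\sigma_{xw}},$$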

which is consistent with equation (12). This derivation demonstrates a more general approach that
differs from Cai and Kuroki (2008), which was based on latent factor analysis (e.g. Bollen, 1989;
Stanghellini, 2004; Stanghellini and Wermuth, 2005; Vicard, 2000). Our approach extends the
identification conditions to cases where the total effect cannot be identified by any single strategy
but by a combination of several strategies (e.g., the back door criterion and the CIV condition in this
case). In addition, unlike the discussion in Section 4.2, the identification of $\alpha_{wu}^2\sigma_{uu}$ is not required;
instead, we will require a proxy variable $W$ such that $U$ d-separates $W$ from $\{X, Z\}$.

Fig. 5: Causal diagram with unmeasured confounders

The power of this approach can be demonstrated in the model of Fig. 5, where a sufficient set
$\{U_1\} \cup U_2$ of variables is unobserved. Here, $U_1$ is univariate but the number of variables in $U_2$ can
be uncertain. In this situation, the back door criterion cannot be used to identify the total effect of
$X$ on $Y$, and the uncertain number of variables in $U_2$ prevents us from identifying the total effect
based on latent factor analysis, in which we need to know the number of unobserved variables. In
addition, because neither $Z_1$ nor $Z_2$ is (conditionally) independent of $\{U_1\} \cup U_2$, they cannot be
used as the CIVs. Nevertheless, we will show that the total effect is identifiable as follows. Since
both $Z_1$ and $Z_2$ are CIVs given $U_1$ relative to an ordered pair of variables $(X, Y)$, the total effect is
given by

$$\tau_{yx} = \frac{\sigma_{yz_1\cdot u_1}}{\sigma_{xz_1\cdot u_1}} = \frac{\sigma_{yz_2\cdot u_1}}{\sigma_{xz_2\cdot u_1}}.$$


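Moreover, since $\{Z_1, Z_2\} \perp\!\!\!\perp W \mid U_1$ holds in the model, we have $\sigma_{z_iw} = \sigma_{z_iu_1}\sigma_{wu_1}/\sigma_{u_1u_1}$ $(i = 1, 2)$,
and we can write

$$\sigma_{yz_1} - \tau_{yx}\sigma_{xz_1} = \frac{\sigma_{z_1u_1}}{\sigma_{u_1u_1}}\left(\sigma_{yu_1} - \tau_{yx}\sigma_{xu_1}\right) = \frac{\sigma_{z_1w}}{\sigma_{z_2w}}\,\frac{\sigma_{z_2u_1}}{\sigma_{u_1u_1}}\left(\sigma_{yu_1} - \tau_{yx}\sigma_{xu_1}\right) = \frac{\sigma_{z_1w}}{\sigma_{z_2w}}\left(\sigma_{yz_2} - \tau_{yx}\sigma_{xz_2}\right).$$

By solving these equations for $\tau_{yx}$, we have

$$\tau_{yx} = \frac{\sigma_{z_1y}\sigma_{z_2w} - \sigma_{z_2y}\sigma_{z_1w}}{\sigma_{z_1x}\sigma_{z_2w} - \sigma_{z_2x}\sigma_{z_1w}}. \qquad (19)$$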


We  now  summarize  these  considerations  in  a  theorem. 


Theorem 2: Suppose that

(i) a non-empty set $\{Z_1, Z_2\}$ of distinct variables satisfies one of the following conditions:
(i-a) both $Z_1$ and $Z_2$ are CIVs given a univariate $U$ relative to an ordered pair of variables
$(X, Y)$;
(i-b) $Z_1$ is a CIV given $U$ relative to an ordered pair of variables $(X, Y)$, $Z_2 = X$,
and $U$ satisfies the back door criterion relative to an ordered pair of variables $(X, Y)$.

(ii) $U$ d-separates $\{Z_1, Z_2\}$ from an observed variable $W$.

Then, the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable and is given by the formula (19). □
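As an illustration of case (i-b), the following sketch evaluates formula (19), $\tau_{yx} = (\sigma_{z_1y}\sigma_{z_2w} - \sigma_{z_2y}\sigma_{z_1w})/(\sigma_{z_1x}\sigma_{z_2w} - \sigma_{z_2x}\sigma_{z_1w})$, with $Z_1 = Z$ and $Z_2 = X$. The covariances were derived by hand from a hypothetical SEM of our own choosing (U -> Z, Z -> X, U -> X, U -> W, U -> Y, X -> Y), in which $Z$ is a CIV given $U$, $U$ satisfies the back door criterion, and $U$ d-separates $\{Z, X\}$ from $W$; the function name `tau_19` and all numbers are assumptions made for illustration.

```python
# Sketch: formula (19) under case (i-b) of Theorem 2, taking Z1 = Z and Z2 = X.

def tau_19(s_z1y, s_z2y, s_z1x, s_z2x, s_z1w, s_z2w):
    """Total effect of X on Y from covariances of Z1, Z2 with Y, X and the proxy W."""
    return (s_z1y * s_z2w - s_z2y * s_z1w) / (s_z1x * s_z2w - s_z2x * s_z1w)

# Hypothetical SEM with unit variances for U and all (independent) errors:
#   Z = U + e_z,   X = Z + 0.5 U + e_x,   W = 2U + e_w,   Y = 0.5 X + U + e_y.
# The implied covariances among the observed variables are:
s_zy, s_xy = 2.25, 3.625
s_zx, s_xx = 2.5, 4.25
s_zw, s_xw = 2.0, 3.0

tau = tau_19(s_z1y=s_zy, s_z2y=s_xy, s_z1x=s_zx, s_z2x=s_xx, s_z1w=s_zw, s_z2w=s_xw)

naive_iv = s_zy / s_zx      # treats Z as an unconditional instrument (biased here)
naive_reg = s_xy / s_xx     # unadjusted regression slope (biased here)
print(tau, naive_iv, naive_reg)
```

In this example neither the unconditional IV ratio nor the unadjusted regression slope equals the structural coefficient 0.5, while formula (19) does.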

5.  Conclusion 

The paper discusses computational and representational problems connected with effect restoration
when confounders are mismeasured or misclassified. In particular, we have explicated how
measurement bias can be removed by creating synthetic samples from empirical samples, and how
inverse-probability weighting can be modified to account for measurement error. These techniques
required an estimate of the noise mechanism, which can be obtained from external studies or
assessed judgmentally. Subsequently, we have derived conditions under which causal effects can be
restored without resorting to external studies, provided the confounder is discrete and is measured
through proxies of sufficiently high cardinality. Finally, we have analyzed measurement bias in
linear systems and explicated graphical conditions under which such bias can be removed.

Appendix:  The  proof  of  Theorem  1 

The proof of Theorem 1 is based on the following two-step procedure, which recovers $\mathrm{pr}(x, y, u)$
from $\mathrm{pr}(x, y, z, w)$.

Step 1: Solve an eigenvalue problem of $P(z,w)^{-1}Q(z,w)$ to recover $\mathrm{pr}(w|u)$ from $U(w,u)$.
Step 2: Recover $\mathrm{pr}(x, y, u)$ using the matrix adjustment method introduced in Section 2.1.

Step 1: To find $\mathrm{pr}(w|u)$ encoded in $U(w,u)$, in terms of observed probabilities, let us consider the
eigenvalue problem of $P(z,w)^{-1}Q(z,w)$. First, noting that $|U(w,u)^{-1}| = 1/|U(w,u)|$, we solve
$|P(z,w)^{-1}Q(z,w) - \lambda I_k| = 0$ for $\lambda$ to obtain the set of eigenvalues of $P(z,w)^{-1}Q(z,w)$. In other
words, $\lambda$ should satisfy

$$\begin{aligned}
|P(z,w)^{-1}Q(z,w) - \lambda I_k| &= |U(w,u)^{-1}\Lambda(u)U(w,u) - \lambda U(w,u)^{-1}U(w,u)| \\
&= |U(w,u)^{-1}|\,|\Lambda(u) - \lambda I_k|\,|U(w,u)| = |\Lambda(u) - \lambda I_k| \\
&= (\mathrm{pr}(y|x,u_{(1)}) - \lambda)\cdots(\mathrm{pr}(y|x,u_{(k)}) - \lambda) = 0. \qquad (20)
\end{aligned}$$

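From this equation, letting $\lambda_1 > \ldots > \lambda_k$ be the eigenvalues of $P(z,w)^{-1}Q(z,w)$, we have
$\lambda_i = \mathrm{pr}(y|x,u_{(i)})$ $(i = 1, \ldots, k)$; thus the elements of $\Lambda(u)$ are estimable. In order to obtain the
eigenvector $\eta_i$ for $\lambda_i$, letting $H = (\eta_1, \ldots, \eta_k)$, we solve the following simultaneous linear equations

$$\{P(z,w)^{-1}Q(z,w) - \lambda_i I_k\}\eta_i = 0, \qquad i = 1, \ldots, k,$$

or, equivalently,

$$P(z,w)^{-1}Q(z,w)H = (\lambda_1\eta_1, \ldots, \lambda_k\eta_k) = (\eta_1, \ldots, \eta_k)\begin{pmatrix}\lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_k\end{pmatrix} = H\Lambda(u). \qquad (21)$$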


Here, it is noted that $\eta_1, \ldots, \eta_k$ are uniquely determined except for a multiplicative constant
because $\lambda_1, \ldots, \lambda_k$ take different values according to assumption (d). On the other hand, letting
$A = U(w,u)^{-1}E$ and $E = \mathrm{diag}(\alpha_1, \ldots, \alpha_k)$ for any non-zero values of $\alpha_1, \ldots, \alpha_k$, we have

$$\begin{aligned}
\{P(z,w)^{-1}Q(z,w)\}A &= \{U(w,u)^{-1}\Lambda(u)U(w,u)\}\{U(w,u)^{-1}E\} \\
&= U(w,u)^{-1}\Lambda(u)E = U(w,u)^{-1}E\Lambda(u) = A\Lambda(u).
\end{aligned}$$

This means that $A$ is also a matrix of eigenvectors of $P(z,w)^{-1}Q(z,w)$, and we have
$A\,(= U(w,u)^{-1}E) = H$ by taking certain values of $\alpha_1, \ldots, \alpha_k$. Then, for the inverse $H^{-1} = (h_{ij})$
of the estimable matrix $H$, we have, using $U(w,u)^{-1}E = H$,

$$U(w,u) = \begin{pmatrix} 1 & \mathrm{pr}(w_1|u_{(1)}) & \cdots & \mathrm{pr}(w_{k-1}|u_{(1)}) \\ \vdots & \vdots & & \vdots \\ 1 & \mathrm{pr}(w_1|u_{(k)}) & \cdots & \mathrm{pr}(w_{k-1}|u_{(k)}) \end{pmatrix} = EH^{-1} = \begin{pmatrix} \alpha_1h_{11} & \cdots & \alpha_1h_{1k} \\ \vdots & & \vdots \\ \alpha_kh_{k1} & \cdots & \alpha_kh_{kk} \end{pmatrix}.$$

Equating the first column of both sides of the equation, the diagonal elements $\alpha_1 = 1/h_{11}, \ldots, \alpha_k = 1/h_{k1}$
of $E$ can be obtained, which indicates that $U(w,u)$ is identifiable from $EH^{-1}$, since $H^{-1}$ is
estimable. Thus, every element $\mathrm{pr}(w|u)$ of $U(w,u)$ can be obtained.

Step 2: To express $\mathrm{pr}(x, y, u)$ in terms of observed probabilities, we use the matrix adjustment
method introduced in Section 2.1. Since we have

$$\mathrm{pr}(x,y,w) = \sum_{i=1}^{k}\mathrm{pr}(x,y,u_i)\,\mathrm{pr}(w|u_i) = \sum_{i=1}^{k}\mathrm{pr}(x,y,u_{(i)})\,\mathrm{pr}(w|u_{(i)}),$$

we substitute the elements $\mathrm{pr}(w_j|u_{(i)})$ $(i, j = 1, \ldots, k)$ obtained in Step 1 for $M(w,u)$ in equation (5).
Then, if $M(w,u)$ is invertible, we can obtain the elements of $V_{xy}(u)$. Thus, the causal effect
is identifiable.

Acknowledgement 

This research was funded in part by the Ministry of Education, Culture, Sports, Science and
Technology of Japan, the Asahi Glass Foundation, Office of Naval Research (ONR), National Institutes
of Health (NIH), and National Science Foundation (NSF).

References 

Bollen,  K.  A.  (1989).  Structural  equations  with  latent  variables.  John  Wiley  &  Sons. 

Bowden,  R.  J.,  and  Turkington,  D.  A.  (1984).  Instrumental  variables.  Cambridge  University  Press. 

Brito, C. and Pearl, J. (2002). Generalized instrumental variables. Proceedings of the 18th Conference
on Uncertainty in Artificial Intelligence, 85-93.

Cai, Z. and Kuroki, M. (2008). On identifying total effects in the presence of latent variables and
selection bias. Proceedings of the Twenty-Fourth Annual Conference on Uncertainty
in Artificial Intelligence, 62-69.

Carroll, R., Ruppert, D., Stefanski, L. and Crainiceanu, C. (2006). Measurement error in nonlinear
models: A modern perspective. 2nd ed. Chapman & Hall/CRC, Boca Raton, FL.

Cochran,  W.  G.  (1938).  The  omission  or  addition  of  an  independent  variate  in  multiple  linear 
regression.  Supplement  to  the  Journal  of  the  Royal  Statistical  Society,  5,  171-176. 

Goetghebeur,  E.  and  Vansteelandt,  S.  (2005).  Structural  mean  models  for  compliance  analysis  in 
randomized  clinical  trials  and  the  impact  of  errors  on  measures  of  exposure.  Statistical  Methods 
in  Medical  Research,  14,  397-415. 

Greenland, S. (2005). Multiple-bias modeling for analysis of observational data. Journal of the
Royal Statistical Society: Series A, 168, 267-306.




Greenland, S. and Kleinbaum, D. (1983). Correcting for misclassification in two-way tables and
matched-pair studies. International Journal of Epidemiology, 12, 93-97.

Greenland,  S.  and  Lash,  T.  (2008).  Bias  analysis.  In  Modern  epidemiology  (K.  Rothman,  S. 
Greenland  and  T.  Lash,  eds.),  3rd  ed.  Lippincott  Williams  and  Wilkins,  Philadelphia,  PA,  345- 
380. 

Hernan,  M.  and  Cole,  S.  (2009).  Invited  commentary:  Causal  diagrams  and  measurement  bias. 
American  Journal  of  Epidemiology,  170,  959-962. 

Pearl,  J.  (1988).  Probabilistic  reasoning  in  intelligent  systems.  Morgan  Kaufmann,  San  Mateo,  CA. 

Pearl, J. (2009). Causality: Models, reasoning, and inference. 2nd ed. Cambridge University Press.

Pearl, J. (2010). On measurement bias in causal inference. Proceedings of the Twenty-Sixth Conference
on Uncertainty in Artificial Intelligence, 425-432.

Pearl, J. and Bareinboim, E. (2011). Transportability across studies: A formal approach. In
Proceedings of the 25th AAAI Conference on Artificial Intelligence, 247-254.

Rosenbaum,  P.  R.  and  Rubin,  D.  B.  (1983).  The  central  role  of  the  propensity  score  in  observational 
studies  for  causal  effects.  Biometrika,  70,  41-55. 

Schneeweiss, S., Rassen, J., Glynn, R., Avorn, J., Mogun, H. and Brookhart, M. (2009).
High-dimensional propensity score adjustment in studies of treatment effects using health care claims
data. Epidemiology, 20, 512-522.

Selen,  J.  (1986).  Adjusting  for  errors  in  classification  and  measurement  in  the  analysis  of  partly  and 
purely  categorical  data.  Journal  of  the  American  Statistical  Association,  81,  75-81. 

Shpitser, I. and Pearl, J. (2006). Identification of joint interventional distributions in recursive
semi-Markovian causal models. Proceedings of the National Conference on Artificial Intelligence,
21, 1219-1226.

Stanghellini, E. (2004). Instrumental variables in Gaussian directed acyclic graph models with an
unobserved confounder. Environmetrics, 15, 463-469.

Stanghellini,  E.  and  Wermuth,  N.  (2005).  On  the  identification  of  directed  acyclic  graph  models 
with  one  hidden  variable.  Biometrika,  92,  337-350. 

Stürmer, T., Schneeweiss, S., Avorn, J. and Glynn, R. (2005). Adjusting effect estimates for
unmeasured confounding in cohort studies with validation studies using propensity score calibration.
American Journal of Epidemiology, 162, 279-289.

Tian,  J.  and  Pearl,  J.  (2003).  On  the  identification  of  causal  effects.  UCLA  Cognitive  Systems 
Laboratory,  Technical  Report  (R-290-L). 

Vicard,  P.  (2000).  On  the  identification  of  a  single  factor  model  with  correlated  residuals.  Biometrika, 
87,  199-205. 

Wermuth, N. (1989). Moderating effects in multivariate normal distributions. Methodika, 3, 74-93.


